Hartford
Distribution-free inference for LightGBM and GLM with Tweedie loss
Manna, Alokesh, Sett, Aditya Vikram, Dey, Dipak K., Gu, Yuwen, Schifano, Elizabeth D., He, Jichao
Prediction uncertainty quantification has become a key research topic in recent years across scientific and business problems. In the insurance industry (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under the relatively weak assumption of exchangeability, and has been well studied in the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures on insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over the other methods considered, as the resulting intervals maintained the nominal coverage with the smallest average width.
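To make the idea of a Pearson-residual non-conformity measure concrete, here is a minimal split-conformal sketch. It assumes a Tweedie variance function V(mu) = mu^p and uses the plain Pearson residual |y - mu| / mu^(p/2) as the score; the function names, the toy numbers, and the choice of p are illustrative assumptions, and this is not necessarily the locally weighted measure the abstract refers to.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]

def pearson_score(y, mu, power):
    """Absolute Pearson residual under the assumed variance function mu**power."""
    return abs(y - mu) / mu ** (power / 2)

def pearson_interval(mu, q, power):
    """Invert the score: mu +/- q * mu**(power/2), clipped at zero for claims."""
    half_width = q * mu ** (power / 2)
    return max(0.0, mu - half_width), mu + half_width

# Hypothetical calibration data: observed claims and fitted means from any model.
cal_y = [0.0, 0.0, 1.2, 0.0, 3.5, 0.0, 0.4, 0.0, 2.0]
cal_mu = [0.5, 0.3, 1.0, 0.8, 2.0, 0.2, 0.6, 0.4, 1.5]
power = 1.5  # assumed Tweedie power, between Poisson (1) and gamma (2)

scores = [pearson_score(y, mu, power) for y, mu in zip(cal_y, cal_mu)]
q = conformal_quantile(scores, alpha=0.2)
lower, upper = pearson_interval(1.0, q, power)
print(lower, upper)
```

Because the half-width scales with mu^(p/2), intervals widen for policies with larger predicted claims, which is the motivation for residuals standardized by the variance function.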
A Variational Information Theoretic Approach to Out-of-Distribution Detection
Mondal, Sudeepta, Jiang, Zhuolin, Sundaramoorthi, Ganesh
We present a theory for the construction of out-of-distribution (OOD) detection features for neural networks. We introduce random features for OOD detection through a novel information-theoretic loss functional consisting of two terms: the first, based on the KL divergence, separates the resulting in-distribution (ID) and OOD feature distributions, and the second is the Information Bottleneck, which favors compressed features that retain the OOD information. We formulate a variational procedure to optimize the loss and obtain OOD features. Based on assumptions on OOD distributions, one can recover properties of existing OOD features, i.e., shaping functions. Furthermore, we show that our theory can predict a new shaping function that outperforms existing ones on OOD benchmarks. Our theory provides a general framework for constructing a variety of new features with clear explainability.
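As a small illustration of the first term only, the sketch below computes the KL divergence between two discrete feature histograms. Treating ID and OOD feature distributions as histograms over shared bins is an assumption made for illustration; it does not reproduce the paper's loss functional or its variational procedure.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same bins.

    Assumes q[i] > 0 wherever p[i] > 0; bins with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical histograms of one feature under ID and OOD inputs.
id_hist = [0.7, 0.2, 0.1]
ood_hist = [0.1, 0.2, 0.7]

# A larger divergence means the feature separates the two distributions better.
print(kl_divergence(id_hist, ood_hist))
```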
The Value of Information in Multi-Scale Feedback Systems
Di Felice, Louisa Jane, Diaconescu, Ada, Zahadat, Payam, Mellodge, Patricia
Complex adaptive systems (CAS) can be described as systems of information flows dynamically interacting across scales in order to adapt and survive. CAS often consist of many components that work towards a shared goal, and interact across different informational scales through feedback loops, leading to their adaptation. In this context, understanding how information is transmitted among system components and across scales becomes crucial for understanding the behavior of CAS. Shannon entropy, a measure of syntactic information, is often used to quantify the size and rarity of messages transmitted between objects and observers, but it does not measure the value that information has for each specific observer. For this, semantic and pragmatic information have been conceptualized as describing the influence on an observer's knowledge and actions. Building on this distinction, we describe the architecture of multi-scale information flows in CAS through the concept of Multi-Scale Feedback Systems, and propose a series of syntactic, semantic and pragmatic information measures to quantify the value of information flows. While the measurement of values is necessarily context-dependent, we provide general guidelines on how to calculate semantic and pragmatic measures, and concrete examples of their calculation through four case studies: a robotic collective model, a collective decision-making model, a task distribution model, and a hierarchical oscillator model. Our results contribute to an informational theory of complexity, aiming to better understand the role played by information in the behavior of Multi-Scale Feedback Systems.
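For reference, the syntactic measure mentioned above can be computed directly from a message distribution. The sketch below uses the standard Shannon definitions; the example probabilities are hypothetical, and it deliberately says nothing about the semantic or pragmatic measures, which the abstract describes as context-dependent.

```python
import math

def surprisal(p):
    """Information content, in bits, of a message that occurs with probability p."""
    return -math.log2(p)

def shannon_entropy(probabilities):
    """Expected surprisal, in bits, of a discrete message distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical source with four equally likely messages: 2 bits per message.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
# A rarer message carries more bits than a common one.
print(surprisal(0.125), surprisal(0.5))
```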
Subtitling Your Life
A little over thirty years ago, when he was in his mid-forties, my friend David Howorth lost all hearing in his left ear, a calamity known as single-sided deafness. "It happened literally overnight," he said. "My doctor told me, 'We really don't understand why.' " At the time, he was working as a litigator in the Portland, Oregon, office of a large law firm. His hearing loss had no impact on his job--"In a courtroom, you can get along fine with one ear"--but other parts of his life were upended. The brain pinpoints sound sources in part by analyzing minute differences between left-ear and right-ear arrival times, the same process that helps bats and owls find prey they can't see.
Wearable ring translates sign language into text
American Sign Language (ASL) has long enabled real-time conversations for English-speaking people who are deaf or hard of hearing. But discussions often face significant lags when one or more conversants aren't fluent in the language system. By combining deep learning artificial intelligence and micro-sonar technologies, researchers at Cornell University are developing a new wearable to help overcome these communication barriers. With further refinement, SpellRing may one day facilitate entire conversations regardless of your ASL comprehension skills. ASL's earliest iterations developed in the early 19th century at the American School for the Deaf in Hartford, Connecticut.
Distributed Multi-Agent Reinforcement Learning with One-hop Neighbors and Compute Straggler Mitigation
Wang, Baoqian, Xie, Junfei, Atanasov, Nikolay
Most multi-agent reinforcement learning (MARL) methods are limited in the scale of problems they can handle. With increasing numbers of agents, the number of training iterations required to find the optimal behaviors increases exponentially due to the exponentially growing joint state and action spaces. This paper tackles this limitation by introducing a scalable MARL method called Distributed multi-Agent Reinforcement Learning with One-hop Neighbors (DARL1N). DARL1N is an off-policy actor-critic method that addresses the curse of dimensionality by restricting information exchanges among the agents to one-hop neighbors when representing value and policy functions. Each agent optimizes its value and policy functions over a one-hop neighborhood, significantly reducing the learning complexity, yet maintaining expressiveness by training with varying neighbor numbers and states. This structure allows us to formulate a distributed learning framework to further speed up the training procedure. Distributed computing systems, however, contain straggler compute nodes, which are slow or unresponsive due to communication bottlenecks, software or hardware problems. To mitigate the detrimental straggler effect, we introduce a novel coded distributed learning architecture, which leverages coding theory to improve the resilience of the learning system to stragglers. Comprehensive experiments show that DARL1N significantly reduces training time without sacrificing policy quality and is scalable as the number of agents increases. Moreover, the coded distributed learning architecture improves training efficiency in the presence of stragglers.
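The restriction to one-hop neighbors can be sketched as follows. The example assumes that two agents are neighbors when they lie within a fixed proximity radius, which is an illustrative assumption rather than a definition taken from the abstract; the function names and toy positions are hypothetical as well.

```python
import math

def one_hop_neighbors(positions, radius):
    """Map each agent index to the agents within `radius` of it (assumed rule)."""
    neighbors = {i: [] for i in range(len(positions))}
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) <= radius:
                neighbors[i].append(j)
    return neighbors

def local_input(states, neighbors, agent):
    """Input to an agent's value or policy function: own state plus neighbor states."""
    return [states[agent]] + [states[j] for j in neighbors[agent]]

positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
hood = one_hop_neighbors(positions, radius=1.5)
print(hood)
print(local_input(positions, hood, agent=0))
```

Each agent's input grows with the size of its neighborhood instead of with the total number of agents, which is the source of the reduced learning complexity described above.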
Large-Scale Dense 3D Mapping Using Submaps Derived From Orthogonal Imaging Sonars
McConnell, John, Collado-Gonzalez, Ivana, Szenher, Paul, Englot, Brendan
3D situational awareness is critical for any autonomous system. However, when operating underwater, environmental conditions often dictate the use of acoustic sensors. These acoustic sensors are plagued by high noise and a lack of 3D information in sonar imagery, motivating the use of an orthogonal pair of imaging sonars to recover 3D perceptual data. Thus far, mapping systems in this area only use a subset of the available data at discrete timesteps and rely on object-level prior information in the environment to develop high-coverage 3D maps. Moreover, simple repeating objects must be present to build high-coverage maps. In this work, we propose a submap-based mapping system integrated with a simultaneous localization and mapping (SLAM) system to produce dense, 3D maps of complex unknown environments with varying densities of simple repeating objects. We compare this submapping approach to our previous works in this area, analyzing simple and highly complex environments, such as submerged aircraft. We analyze the tradeoffs between a submapping-based approach and our previous work leveraging simple repeating objects. We show where each method is well-motivated and where they fall short. Importantly, our proposed use of submapping achieves an advance in underwater situational awareness with wide aperture multi-beam imaging sonar, moving toward generalized large-scale dense 3D mapping capability for fully unknown complex environments.
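One way to picture why an orthogonal sonar pair recovers 3D data: a single imaging sonar reports range and one angle, so a second sonar rotated by 90 degrees can supply the missing angle. The sketch below assumes the two returns have already been associated with the same point and share a common origin; the function name and the axis convention are illustrative assumptions, not details from the abstract.

```python
import math

def fuse_orthogonal_returns(range_m, bearing_rad, elevation_rad):
    """Convert an associated (range, bearing, elevation) triple to Cartesian x, y, z.

    Bearing is assumed to come from the horizontal sonar and elevation from the
    vertical one; both are measured from the forward (x) axis.
    """
    x = range_m * math.cos(elevation_rad) * math.cos(bearing_rad)
    y = range_m * math.cos(elevation_rad) * math.sin(bearing_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)

# A return 2 m straight ahead, and one 2 m away raised by 30 degrees.
print(fuse_orthogonal_returns(2.0, 0.0, 0.0))
print(fuse_orthogonal_returns(2.0, 0.0, math.pi / 6))
```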
Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments
Imai, Kosuke, Nakamura, Kentaro
In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence. Specifically, we propose to use deep generative models such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, our proposed approach eliminates the need to learn causal representation from the data and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to settings in which the treatment feature is based on human perception rather than assumed to be fixed given the treatment object. The proposed methodology is also applicable to text reuse, where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.
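For readers unfamiliar with double machine learning, the sketch below shows the standard doubly robust (AIPW) score commonly used to estimate an average treatment effect. It assumes the nuisance predictions (outcome models and propensity scores) were already obtained by cross-fitting on held-out folds; the function name and toy data are hypothetical, and this is a generic illustration rather than the estimator proposed in the paper.

```python
def aipw_ate(y, t, mu1, mu0, e):
    """Average of the doubly robust score over all units.

    y: outcomes, t: binary treatments, mu1/mu0: predicted outcomes under
    treatment/control, e: predicted propensity scores strictly inside (0, 1).
    """
    scores = [
        m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        for yi, ti, m1, m0, ei in zip(y, t, mu1, mu0, e)
    ]
    return sum(scores) / len(scores)

# Toy data in which the outcome models are exactly right, so the effect is 2.
y = [3.0, 1.0]
t = [1, 0]
mu1 = [3.0, 3.0]
mu0 = [1.0, 1.0]
e = [0.5, 0.5]
print(aipw_ate(y, t, mu1, mu0, e))
```

The division by e and 1 - e is where overlap violations cause trouble: propensities near 0 or 1 inflate the correction terms, which is why an estimation strategy that avoids such violations matters.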